Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?
Abstract
The goal of this paper is to show that the usual evaluation methods for text segmentation are not suited to every task related to text segmentation. To do so, we distinguished the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics within a single text. We worked on a corpus of twenty-two French political speeches, trying to find the boundaries between them when they are concatenated and to find topic boundaries within them when they are not. We compared the results of our distance-based method with those of the well-known C99 algorithm.
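The abstract does not name the "usual evaluation methods" it criticizes. The two most widely used ones in the segmentation literature are Pk and WindowDiff, so the sketch below assumes those are the metrics in question. It is a minimal illustration, not the authors' evaluation code: each unit (e.g. a sentence) carries a segment label, a boundary sits wherever two adjacent labels differ, and the helper names `count_boundaries`, `default_k`, `pk` and `window_diff` are hypothetical. The window size `k` defaults to half the mean reference segment length, a common convention.

```python
from typing import Optional, Sequence


def count_boundaries(labels: Sequence[int], start: int, k: int) -> int:
    """Count boundaries between unit `start` and unit `start + k`.

    Assumption: a boundary lies between units j and j + 1 whenever
    their segment labels differ.
    """
    return sum(labels[j] != labels[j + 1] for j in range(start, start + k))


def default_k(reference: Sequence[int]) -> int:
    """Half the mean reference segment length, rounded, at least 1."""
    n_segments = 1 + count_boundaries(reference, 0, len(reference) - 1)
    return max(1, round(len(reference) / n_segments / 2))


def _check(reference: Sequence[int], hypothesis: Sequence[int],
           k: Optional[int]) -> int:
    """Validate inputs and resolve the window size."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must label the same units")
    if k is None:
        k = default_k(reference)
    if not 0 < k < len(reference):
        raise ValueError("k must satisfy 0 < k < number of units")
    return k


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Share of windows where reference and hypothesis disagree on
    whether the two units at the window ends are in the same segment."""
    k = _check(reference, hypothesis, k)
    windows = len(reference) - k
    errors = sum(
        (count_boundaries(reference, i, k) > 0)
        != (count_boundaries(hypothesis, i, k) > 0)
        for i in range(windows)
    )
    return errors / windows


def window_diff(reference: Sequence[int], hypothesis: Sequence[int],
                k: Optional[int] = None) -> float:
    """Share of windows where reference and hypothesis contain a
    different number of boundaries."""
    k = _check(reference, hypothesis, k)
    windows = len(reference) - k
    errors = sum(
        count_boundaries(reference, i, k) != count_boundaries(hypothesis, i, k)
        for i in range(windows)
    )
    return errors / windows


if __name__ == "__main__":
    ref = [0, 0, 0, 0, 1, 1, 1, 1]  # two segments of four sentences
    hyp = [0, 0, 0, 1, 2, 2, 2, 2]  # correct boundary plus a spurious one just before it
    print(pk(ref, hyp), window_diff(ref, hyp))  # → 1/6 and 1/3 (k = 2)
```

Both scores depend only on where boundaries fall inside a sliding window of `k` units. Neither distinguishes what kind of boundary is being scored, so a boundary between two concatenated texts and a topic transition inside one text are penalized in exactly the same way.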